ABSTRACT

The fraction of faults detected for a digital network is frequently high 
for the first few input combinations applied out of a set of test vectors. 

When the particular ordering of test patterns does not appreciably change the 
shape of the coverage curve, there appears to be an advantage to splitting the 
test into segments which are applied at different times. It is shown that the 
expected time to error detection and the probability of an undetected double
error can be reduced. The amount of reduction is dependent on the shape of 
the fault coverage curve. It is conjectured that such a reduction can be 
obtained for VLSI networks. 
INTRODUCTION


The shape of a fault coverage curve, i.e., the cumulative fraction of
detected faults as a function of the number of tests applied, is often a
saturating curve. Recent work on fault-tolerant computing (1,2) and built-in
test (3) has reported such situations. This characteristic is found across many
test generation methods: fault table minimization, the D-algorithm, path
sensitizing, random testing, etc. (4).

The basic strategy proposed in this paper is to split up a test set into
two or more segments, then apply these shorter tests at different times so that
there is a reduced time between periods of testing. If each test segment
detects many or most faults then it is expected that faults should be detected
sooner on the average.

Consider the 15 gate combinational logic circuit in
Figure 1. The function realized is a 3 out of 5 select followed by a majority
vote. Three ones on the E inputs select three T lines out of the five using
gates 1 to 5; gates 6 to 11 OR the selected lines in various pairs which are
ANDed at gates 12 to 14. Gate 15 ORs these products to realize the majority
of the three selected T inputs. This function was implemented in a
Rockwell-Collins gate array as part of a VLSI project. Suppose the possible
faults are single gate input or output stuck-at faults. There are 3 leads each
for gates 1 to 14 and 4 leads for gate 15. The 46 total pins (14 x 3 + 4)
result in 92 single faults, since each pin can be stuck at 0 or stuck at 1.
Since there are no reconvergent fan-out paths of differing inversion
parity, expanding and contracting faults can be considered separately. We
start with tests for expanding faults. It is easy to show that the following
4 tests will detect all combinations of gate input or gate output stuck faults
which increase the number of ones in the function.
    T11:  E = 11100   T = 01011
    T12:  E = 00111   T = 11100
    T13:  E = 01011   T = 11100          (1)
    T14:  E = 11100   T = 00111

Any single test from set (1) will detect 27 of the 46 expanding stuck
faults, while any two consecutive tests (including T14 followed by T11) will
detect 44 of the 46.

For contracting faults the following six tests will detect gate input or 
output stuck faults. 

    T01:  E = T = 10010
    T02:  E = T = 00101
    T03:  E = T = 01001
                                         (2)
    T04:  E = T = 01010
    T05:  E = T = 11000
    T06:  E = T = 00110

The first 3 tests detect 40 out of the 46 contracting faults. 
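As a sanity check on sets (1) and (2), the circuit can be modeled functionally as a vote over the T lines selected by E. The sketch below is only a functional model under that assumption, not the gate-level netlist of Figure 1, and the name `select_majority` is hypothetical. It checks that the fault-free output is 0 for every test of set (1), which is what a test for faults that add ones requires, and 1 for every test of set (2), which is what a test for faults that remove ones requires.

```python
def select_majority(e: str, t: str) -> int:
    """Assumed functional model: 1 if at least two E-selected T lines are 1."""
    selected_ones = sum(1 for ei, ti in zip(e, t) if ei == "1" and ti == "1")
    return 1 if selected_ones >= 2 else 0

# Test vectors as read from sets (1) and (2): (E, T) bit strings.
SET_1 = [("11100", "01011"), ("00111", "11100"),
         ("01011", "11100"), ("11100", "00111")]
SET_2 = [(v, v) for v in ("10010", "00101", "01001",
                          "01010", "11000", "00110")]

outputs_1 = [select_majority(e, t) for e, t in SET_1]
outputs_2 = [select_majority(e, t) for e, t in SET_2]
print(outputs_1)
print(outputs_2)
```

A stuck fault that adds ones can only be observed when the good output is 0, and one that removes ones only when the good output is 1, which is why the two sets are built separately.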
Suppose we interlace sets (1) and (2) as follows:

'^Ol* *^02‘ '^03» '^13* '^04» '^05* ’’06 

It is easy to count the number of single stuck faults detected by (3) as
individual patterns are applied. The resulting fault coverage curve is given 
as the solid line in Figure 2. Rotating (3) and considering other initial 
test patterns gives the other two curves in Figure 2. 

Next we compare two different methods of applying set (3): 

1. Complete Test: The entire set (3) is applied every I time units.

2. Segmented Test: The first 5 tests of (3) are applied at every odd multiple
of I/2 time units and the last 5 tests of (3) at every even multiple of I/2.

In the complete case any single fault in the first I time units is
detected by the test at time I. We ignore faults which occur during the
testing for the moment. This assumption does not significantly change the
result. Suppose further that the fault process is stationary. The expected
time to fault detection ETFD is thus half the interval or I/2.

For the segmented case consider single faults that occur in the first
interval of I/2. Some faults are missed by the five tests at I/2 and are not
detected until the end of the next I/2 segment by the remaining five tests.
For this example it turns out that the same fraction of faults in the second
subinterval are missed by the even set.

The faults detected by the odd set will have an average time to detection
of half the subinterval or I/4. The 11 faults missed by the odd set will have
their time to detection increased by I/2, the time between the odd and even
subtests.


Thus we can express the expected time to fault detection for this
segmented test as

$$\mathrm{ETFD} = \frac{I}{4}\cdot\frac{81}{92} + \left(\frac{I}{4} + \frac{I}{2}\right)\frac{11}{92} = \frac{I}{2}\cdot\frac{57}{92} \qquad (4)$$


The second factor of (4), 57/92, is the factor by which we have reduced the
ETFD by partitioning the complete test into two segments. This reduction is
bought at a cost of testing twice as often: the overhead in shifting to
testing mode has been doubled.
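The arithmetic behind the 57/92 factor can be reproduced in a few lines. The following is a minimal sketch using exact fractions; the function name `etfd_two_segments` and the choice I = 1 are illustrative assumptions, while the counts 11 and 92 are taken from the example.

```python
from fractions import Fraction

def etfd_two_segments(interval, missed, total):
    """Expected time to fault detection when the test is split into two halves."""
    half = Fraction(interval) / 2
    caught_first = Fraction(total - missed, total)
    missed_first = Fraction(missed, total)
    # Faults caught by the next half test wait half a subinterval on average;
    # faults that it misses wait one further subinterval.
    return caught_first * (half / 2) + missed_first * (half / 2 + half)

I = 1  # test period in arbitrary time units (assumed)
etfd_complete = Fraction(I, 2)
etfd_segmented = etfd_two_segments(I, missed=11, total=92)
reduction = etfd_segmented / etfd_complete
print(etfd_segmented, reduction)  # 57/184 57/92
```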

Another measure of testing goodness applicable to fault-tolerant systems
is the probability of an undetected double error. The idea is that a single 
fault tolerant system can either adapt or flag the rest of the world when a 
single error is detected yet still compute correctly. Undetected double 
errors on the other hand might lead to overall system failure (1, 2). We 
assume a Poisson fault process (5) to estimate the probability of an 
undetected double error. 

For a Poisson process with rate $\lambda$, the probability of exactly k
occurrences in time interval t is

$$p(k,t) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!} \qquad (5)$$

To simplify the example we assume that $\lambda t$ is very small so that
$e^{-\lambda t}$ can be treated as the value 1. This assumption does not
appreciably change the character of the results. Expression (5) thus is
reduced to

$$p(k,t) = \frac{(\lambda t)^k}{k!} \qquad (6)$$


Also we note that p(k,t) is much larger than p(k+1,t), i.e., single errors are
much more likely than double errors which in turn are much more likely than
triple errors, etc.
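The effect of dropping the exponential can be checked numerically. The sketch below compares the exact Poisson probability with the small-rate form for an assumed value of the product of rate and time of 0.001; that value and the function names are illustrative only.

```python
import math

def poisson_exact(k, lam_t):
    """Exact Poisson probability of k occurrences, expression (5)."""
    return math.exp(-lam_t) * lam_t**k / math.factorial(k)

def poisson_small_rate(k, lam_t):
    """Small-rate approximation with the exponential replaced by 1, expression (6)."""
    return lam_t**k / math.factorial(k)

lam_t = 1e-3  # assumed value of rate times time
for k in (1, 2, 3):
    exact = poisson_exact(k, lam_t)
    approx = poisson_small_rate(k, lam_t)
    # The relative error is exp(lam_t) - 1, independent of k.
    print(k, exact, approx, approx / exact - 1)
```

For this rate the relative error is about 0.1 percent, and each additional fault lowers the probability by roughly three orders of magnitude.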

Consider $P_2$, the probability of an undetected double error in an interval
I between complete test sets. For L successive frames of I this probability
is approximately

$$P_2 \approx \frac{L(\lambda I)^2}{2} \qquad (7)$$


Expression (7) assumes that $L\lambda I$ is small compared to 1.

Next consider the complete test divided in two and applied every I/2 time
units. Double errors within the shorter interval I/2 are undetected. In
addition some single errors in the first interval are undetected after the
partial test at I/2. If such an undetected fault is followed by another fault
in the next interval before time I then an undetected double error has
occurred. In a similar fashion there are some single errors between time I/2
and I which when followed by another single error in the next half interval
result in an undetected double error. Note that these cases are in essence
the same as those in the discussion leading to expression (4). Adding the
probability of these three situations for L frames of I we find a total
probability

$$P_2 \approx L\,\frac{(\lambda I)^2}{4} + L\,\frac{11}{92}\,\frac{(\lambda I)^2}{4} + (L-1)\,\frac{11}{92}\,\frac{(\lambda I)^2}{4} \qquad (8)$$
Replacing (L-1) in the last term of (8) by L simplifies the expression with
the following upper bound.

$$P_2 \le \frac{L(\lambda I)^2}{2}\cdot\frac{1}{2}\cdot\left(1 + \frac{11}{92} + \frac{11}{92}\right) \qquad (9)$$


The particular arrangement of (9) is intentional. The form is extended
to the general case in appendix A. The left-most portion, $L(\lambda I)^2/2$, is
the probability for the complete test (7). Next, the 1/2 term corresponds to
one over the number of segments. Finally, the right-most term is 1 plus twice
the fraction of undetected faults. Comparing (9) and (7) we see that the
segmented test probability is reduced by a factor of (57/92).
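The three cases described above can be summed directly and compared with the complete test. This is a hedged sketch: the three-term sum follows the verbal description in the text, and the number of frames and the value of the rate times I are assumed purely for illustration.

```python
from fractions import Fraction

def p2_complete(frames, lam_i):
    """Expression (7): undetected double error probability, complete test."""
    return frames * lam_i**2 / 2

def p2_two_segments(frames, lam_i, alpha):
    """Sum of the three cases described in the text for a test split in two."""
    half = lam_i / 2                                  # rate times I/2
    within_half = 2 * frames * half**2 / 2            # two faults inside one half interval
    missed_at_half = frames * alpha * half**2         # missed at I/2, second fault before I
    missed_at_full = (frames - 1) * alpha * half**2   # missed at I, second fault in next half
    return within_half + missed_at_half + missed_at_full

def p2_upper_bound(frames, lam_i, alpha):
    """Form of expression (9): (L - 1) replaced by L."""
    return p2_complete(frames, lam_i) * Fraction(1, 2) * (1 + 2 * alpha)

frames = 10                  # assumed number of frames L
lam_i = Fraction(1, 1000)    # assumed value of the rate times I
alpha = Fraction(11, 92)     # miss fraction of either half test in the example

complete = p2_complete(frames, lam_i)
exact = p2_two_segments(frames, lam_i, alpha)
bound = p2_upper_bound(frames, lam_i, alpha)
print(bound / complete)  # 57/92
print(exact <= bound)    # True
```

The bound exceeds the three-term sum by a single missed-fault term, which becomes negligible as the number of frames grows.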


GENERAL CASE 


In appendix A expressions are developed for the mean time to fault 
detection and for the probability of an undetected double error when segmented 
testing is used. Each of the M test segments is assumed to be applied at 
uniform time intervals. The extension to nonuniform application is 
straightforward and could be advantageous in some cases. The forms of these
expressions allow a segmenting gain to be defined, assuming the same average
testing effort for segmented and for nonsegmented test. This gain g(M) is the
ratio of the nonsegmented value to the segmented value, so that a gain greater
than one is an improvement. For both mean time to detection and probability of
an undetected double error, the same gain expression results,

$$g(M) = \frac{M}{1 + 2\sum_{i=1}^{M-1} \bar{\alpha}_i} \qquad (10)$$

The parameter M is the number of test segments and $\bar{\alpha}_i$ denotes the
fraction of faults missed by i consecutive test segments, averaged over all M
starting positions.

If the coverage curve is a straight line, then it is easy to show that 
the gain is always 1. In this case there is no advantage to segmenting. When 
the curve is concave (e.g., where there is a lot of initialization of state
variables in a sequential network) then the gain is less than one and
segmentation makes things worse. Fortunately the convex shape seems much more
prevalent and improvement is often possible.
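The three coverage shapes just described can be compared numerically with expression (10). The miss fractions below are hypothetical curves chosen only to illustrate a straight-line, a quickly saturating, and a slow-starting coverage curve for M = 4; they are not data from the paper.

```python
def segmenting_gain(alpha_bar):
    """Expression (10); alpha_bar[i-1] is the miss fraction after i segments, i = 1..M-1."""
    m = len(alpha_bar) + 1
    return m / (1 + 2 * sum(alpha_bar))

M = 4
linear = [1 - i / M for i in range(1, M)]               # coverage grows as i/M
saturating = [(1 - i / M) ** 2 for i in range(1, M)]    # most faults caught early
slow_start = [1 - (i / M) ** 2 for i in range(1, M)]    # most faults caught late

g_linear = segmenting_gain(linear)
g_saturating = segmenting_gain(saturating)
g_slow_start = segmenting_gain(slow_start)
print(g_linear, g_saturating, g_slow_start)
```

The straight-line curve gives a gain of exactly 1, the saturating curve a gain above 1, and the slow-starting curve a gain below 1.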

Examining the gain expression (10) we see that the numerator and
denominator both increase with M. Whether an optimum value of M exists
depends on the coverage curve. Practically speaking one would expect test
overhead to eventually become significant as M is increased. In addition, M
is limited to the number of individual tests in the complete test T. With
these factors in mind it is instructive to consider the theoretical case where
the coverage function is an exponential. Let

$$\bar{\alpha}_i = 2^{-Ki/M} \qquad (11)$$

When the number of faults times $\bar{\alpha}_i$ is less than one we assume that the
test is complete. Using (11) in (10) and letting i become large will merely
lower bound the possible gain. Substituting (11) into (10) we obtain a lower
bound on g(M),


$$g(M) \ge \frac{M\,(2^{K/M} - 1)}{2^{K/M} + 1} \qquad (12)$$


Supposing that M is a continuous variable and maximizing (12) with respect to
M we find that the maximum occurs for M very large and that

$$\lim_{M \to \infty} g(M) = \frac{K \ln 2}{2}$$

or approximately

$$\hat{g} = 0.34657\,K \qquad (13)$$

where we let $\hat{g}$ denote the unrealizable gain maximum.

M larger than the number of tests has no meaning. For M = K/2 in (12),

$$g(K/2) = 0.3\,K \qquad (14)$$

and

$$g(K/2) = 0.8656\,\hat{g}.$$

For the gain to be half of $\hat{g}$ we find numerically that

$$M = \frac{K}{5.525}.$$
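The constants quoted for the exponential case can be checked with a short numerical sketch. The lower bound is written as M(2^(K/M) - 1)/(2^(K/M) + 1), and a simple bisection, chosen here only for illustration, solves for the ratio K/M at which the bound falls to half of its limiting value. Function names and the value K = 12 are assumptions of this sketch.

```python
import math

def gain_lower_bound(m, k):
    """Lower bound on the gain for exponential coverage, expression (12)."""
    r = 2.0 ** (k / m)
    return m * (r - 1) / (r + 1)

def gain_limit(k):
    """Limiting value of the bound as M grows without limit, expression (13)."""
    return k * math.log(2) / 2

def half_gain_ratio(tol=1e-10):
    """Bisection for x = K/M where the bound equals half of the limiting gain."""
    def f(x):
        # Bound divided by K, minus half of the limit divided by K; decreasing in x.
        return (2.0**x - 1) / (2.0**x + 1) / x - math.log(2) / 4
    lo, hi = 1.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

K = 12  # assumed exponent parameter, used only to form ratios
print(gain_lower_bound(K / 2, K) / K)              # about 0.3
print(gain_lower_bound(K / 2, K) / gain_limit(K))  # about 0.8656
print(half_gain_ratio())                           # about 5.525
```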


Example 2 

Suppose that a network requires L = 48 tests and that the coverage curve
satisfies (11) with K = 12. The limiting value for the gain $\hat{g}$ is 4.159. When
M = 6 there are 6 subsets of 8 tests each and the $\bar{\alpha}$ coefficients would be

$$\bar{\alpha}_i = \left(\frac{1}{4}\right)^i .$$

Any 8 test segment detects 3/4 of the faults.

The gain from (12) or (14) computes to

$$g(6) = 3.6$$

Evaluating (10) directly, assuming $\bar{\alpha}_i$ is zero for i > 5, gives

$$g(6) = 3.601$$
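Both values of g(6) can be reproduced with a few lines. This sketch takes the miss fractions of Example 2 to be powers of 1/4 for i = 1 to 5 and zero afterwards; variable and function names are illustrative.

```python
def gain_direct(alpha_bar):
    """Expression (10) evaluated on an explicit list of miss fractions."""
    m = len(alpha_bar) + 1
    return m / (1 + 2 * sum(alpha_bar))

M, K = 6, 12
alpha_bar = [2.0 ** (-K * i / M) for i in range(1, M)]  # (1/4)**i for i = 1..5
g_direct = gain_direct(alpha_bar)
g_bound = M * (2 ** (K / M) - 1) / (2 ** (K / M) + 1)   # closed-form lower bound
print(round(g_direct, 3), g_bound)  # 3.601 3.6
```

The direct value is slightly larger because the sum is truncated after five terms, whereas the closed-form bound sums the geometric series to infinity.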


Table 1 lists the gain function for example 2 as a function of M.

    M, Number       g(M), Improvement
    of Segments     Ratio or Gain
         2               1.938
         3               2.647
         4               3.111
         6               3.601
        12               4.000
        24               4.118
        48               4.149

Table 1. Gain for Example 2
CASE STUDY

The segmented testing idea was applied to a particular subsystem of SSI
and MSI logic called the bus guardian unit (BGU). A brief description of this
subsystem and the testing assumptions are given in appendix B. The unit has a
complexity of 1296 equivalent gates and 655 package pins. Assuming single pin
stuck at 1 or 0 faults, an upper bound on the fraction of missed faults was
estimated for M = 6. From these estimates (Table B1) we can compute a lower
bound on the segmenting gain (expression 10) for 2, 3 and 6 segments (Table 2).
It is felt that these lower bounds are within 10% of the actual values for the
test set considered.



    M, Number       g(M), Improvement
    of Segments     Ratio or Gain
         2               1.49
         3               1.79
         6               2.22

Table 2. Segmenting Gain for the Case Study


DISCUSSION 


The segmenting of a test set presented here may also be applied when a
processor is tested by executing self-test code. Segmenting should be
beneficial whenever the overhead associated with switching to test mode is
small and little initialization is associated with test subsequences.

It is possible to specifically design test sets to enhance the gain
obtained from segmenting. Examples have been constructed where a slightly
longer test which is constructed to be segmented yields a lower mean time to
detection than the shortest test set. In both cases the average number of
tests per unit time was held constant. Finally it may be possible to design
networks to maximize the gains from a segmented testing environment.
Appendix A


Expressions for General Coverage 

In this appendix we develop expressions for the probability of a double
error occurring before the first error is detected, $P_2$, and the mean time
between single fault occurrence and detection, MTFD. The fault process is
assumed to be Poisson with the fault rate $\lambda$ very small.

Initially suppose that a complete fault detection set T of length L is 
divided into M segments and is applied to a unit under test uniformly spaced 
in time and repeated periodically. The total time for one complete test is 
denoted by I. The fraction of time actually spent applying test patterns is 
assumed to be small. If not, the results are not changed significantly but 
the analysis is considerably more complex. Let $T_j$ denote the jth test
segment. The total test T is then

$$T = T_0 T_1 \cdots T_{M-1} \qquad (A1)$$

Any rotation of T, e.g.,

$$T_j T_{j+1} \cdots T_{M-1} T_0 T_1 \cdots T_{j-1} \qquad (A2)$$

is assumed to also be a complete test and to have the same coverage curve as
T. This is for simplicity of notation and will be relaxed later. Let $\alpha_j$
denote the fraction of faults which are undetected by the first j test
segments. Clearly $\alpha_j$ is nonincreasing in j. Figure A1 indicates $\alpha_j$ on a
typical coverage curve.
For a Poisson process the probability of exactly k occurrences in a time
t is given by

$$p(k,t) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!} \qquad (A3)$$


where $\lambda$ is the rate parameter. Consider the system for L complete testings.
In this total time there are LM intervals between test segments. Consider the
instances where a single error in the first interval is undetected before some
other error occurs. A double error before any testing has probability

$$e^{-\lambda I/M}\,\frac{1}{2}\left(\frac{\lambda I}{M}\right)^2 \qquad (A4)$$


We suppose that $\lambda I$ is very small compared to 1 so that the exponential term
can be replaced by 1 in the various expressions. Thus a double error in the
first interval has probability given by $(\lambda I)^2 / 2M^2$.

A single fault in the first interval which is missed by the first test segment
has probability $\alpha_1 \lambda I / M$, assuming equally likely faults. A single
fault in the second interval has probability $\lambda I / M$. This situation has
probability $\alpha_1 (\lambda I)^2 / M^2$. In a similar fashion an error in the
first block which is undetected in the first j intervals followed by an error
in interval j+1 has probability $\alpha_j (\lambda I)^2 / M^2$.

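In the LM intervals the first case occurs LM times, the second occurs LM-1
times, and the last occurs LM-j times. Adding all such terms results in the
probability of two errors in an interval or a single error which is undetected
before a second error occurs, $P_2$,

$$P_2 = \frac{(\lambda I)^2}{M^2}\left[\frac{LM}{2} + \alpha_1 (LM-1) + \alpha_2 (LM-2) + \cdots + \alpha_j (LM-j) + \cdots\right]$$

$$\quad = \frac{(\lambda I)^2}{M^2}\left[\frac{LM}{2} + \sum_{j=1}^{M-1} \alpha_j (LM-j)\right]$$

$$\quad = \frac{L(\lambda I)^2}{2M}\left[1 + 2\sum_{j=1}^{M-1} \alpha_j \left(1 - \frac{j}{LM}\right)\right] \qquad (A5)$$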


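For L large we can replace $(1 - j/LM)$ by 1 and approximate this expression by

$$P_2 \approx \frac{L(\lambda I)^2}{2M}\left[1 + 2\sum_{j=1}^{M-1} \alpha_j\right] = \frac{L(\lambda I)^2}{2}\cdot\frac{1}{M}\cdot\left(1 + 2\sum_{j=1}^{M-1} \alpha_j\right) \qquad (A6)$$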


Expressions (A5) and (A6) omit triple and higher order faults since we
have assumed that $\lambda I$ is very small. The first factor $L(\lambda I)^2/2$ is the
probability of two errors within an interval I for L intervals. This is the
case where the total test is applied once in I time units. The factor 1/M is
the maximum reduction possible for M segments when the coverage curve is very
steep, i.e., the sum of the $\alpha_j$ is small. The last term is the dependency on
the shape of the coverage curve.
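How quickly the large-L approximation converges can be seen by evaluating both forms. The sketch below writes the exact expression with the factor (1 - j/LM) inside the sum and the approximation with that factor set to 1; the miss fractions, number of frames, and rate are hypothetical values chosen for illustration.

```python
def p2_exact(frames, m, lam_i, alpha):
    """Form of (A5); alpha[j-1] is the miss fraction after j segments."""
    lm = frames * m
    weighted = sum(a * (1 - j / lm) for j, a in enumerate(alpha, start=1))
    return frames * lam_i**2 / (2 * m) * (1 + 2 * weighted)

def p2_approx(frames, m, lam_i, alpha):
    """Form of (A6): complete-test probability times 1/M times the shape factor."""
    return frames * lam_i**2 / 2 * (1 / m) * (1 + 2 * sum(alpha))

alpha = [0.3, 0.2, 0.1]   # hypothetical miss fractions for M = 4
frames, m, lam_i = 1000, 4, 1e-4
exact = p2_exact(frames, m, lam_i, alpha)
approx = p2_approx(frames, m, lam_i, alpha)
print(exact, approx, (approx - exact) / approx)
```

For these values the two forms differ by a few parts in ten thousand, and the approximation is always the larger of the two.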

Next suppose that a rotation of the test set (A2) gives a different
coverage curve than the initial order (A1). Up to M different fractional
probabilities could result. But in our summation leading up to (A5) we can
replace $\alpha_j$ by the average of the M possibly different fractional
coefficients, one for each phase of (A2). Thus we define an average
fractional coefficient $\bar{\alpha}_j$,

$$\bar{\alpha}_j = \frac{1}{M}\sum_{i=0}^{M-1} \left(\text{fraction of faults undetected by } T_i T_{i+1} \cdots T_{i+j-1}\right)$$



where the subscripts are added modulo M. Using the same procedure as led to
(A6) we find the probability $P_2$,

$$P_2 \approx \frac{L(\lambda I)^2}{2}\cdot\frac{1}{M}\cdot\left[1 + 2\sum_{j=1}^{M-1} \bar{\alpha}_j\right] \qquad (A7)$$
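The averaging over rotations can be made concrete when the set of faults detected by each segment is known, for instance from fault simulation. The following sketch uses a small hypothetical fault dictionary with eight faults and three segments; the data and the function name are illustrative assumptions.

```python
def average_miss_fractions(detected_by_segment, all_faults):
    """Average miss fraction after j consecutive segments, j = 1..M-1, over all M rotations."""
    m = len(detected_by_segment)
    total = len(all_faults)
    alpha_bar = []
    for j in range(1, m):
        missed_sum = 0
        for start in range(m):
            detected = set()
            for step in range(j):
                # Segment subscripts are added modulo M.
                detected |= detected_by_segment[(start + step) % m]
            missed_sum += total - len(detected & all_faults)
        alpha_bar.append(missed_sum / (m * total))
    return alpha_bar

# Hypothetical example: 8 faults, 3 segments with unequal coverage.
faults = set(range(8))
segments = [{0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 6}, {4, 5, 6, 7}]
alpha_bar = average_miss_fractions(segments, faults)
gain = len(segments) / (1 + 2 * sum(alpha_bar))
print(alpha_bar, gain)
```

Here single segments miss 2, 3, and 4 faults, and only one of the three rotations still misses a fault after two segments.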


The computation of the mean time between single fault occurrence and
detection, MTFD, is quite similar to that of $P_2$. Suppose initially that any
rotation of T has the same coverage curve. A fault can occur uniformly within
the first interval I/M. For those faults that are detected by the first test
segment the mean time is just half the interval or I/2M. The fraction of such
faults is $1 - \alpha_1$. The fraction $\alpha_1 - \alpha_2$ of faults in the first
interval is detected by the second test segment. For this second class the mean
time is I/M longer or 3I/2M. Forming the expected value we find

$$\mathrm{MTFD} = \frac{I}{2M}\left[(1 - \alpha_1) + 3(\alpha_1 - \alpha_2) + 5(\alpha_2 - \alpha_3) + \cdots\right]$$

$$\quad = \frac{I}{2}\cdot\frac{1}{M}\cdot\left[1 + 2\sum_{j=1}^{M-1} \alpha_j\right] \qquad (A8)$$

Again the interpretation of (A8) is similar to that of (A6). The factor
I/2 is the value expected for no segmenting, the second factor 1/M is the
maximum conceivable reduction for M segments, and the last factor is the
coverage curve coefficient.
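The telescoping step from the expected-value sum to the closed form can be verified with exact arithmetic. In this sketch the miss fractions are the case study values of Table B1, used only as sample input, and I = 1 is an arbitrary choice.

```python
from fractions import Fraction

def mtfd_by_classes(interval, alpha):
    """Expected-value form: faults caught at the (j+1)th segment wait (2j+1) I/2M."""
    m = len(alpha) + 1
    miss = [Fraction(1)] + list(alpha) + [Fraction(0)]  # miss fraction 1 before any test, 0 after M
    base = Fraction(interval) / (2 * m)
    return sum((2 * j + 1) * base * (miss[j] - miss[j + 1]) for j in range(m))

def mtfd_closed_form(interval, alpha):
    """Closed form: I/2 times 1/M times (1 + twice the sum of miss fractions)."""
    m = len(alpha) + 1
    return Fraction(interval) / 2 * Fraction(1, m) * (1 + 2 * sum(alpha))

alpha = [Fraction(n, 100) for n in (29, 23, 17, 11, 5)]  # sample miss fractions, M = 6
by_classes = mtfd_by_classes(1, alpha)
closed = mtfd_closed_form(1, alpha)
print(by_classes, closed)  # 9/40 9/40
```

With no segmenting the mean time would be 1/2, so these sample values correspond to a reduction by the factor 9/20.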

As was the case for $P_2$, (A8) is easily extended to different coverage
curves for rotations and

$$\mathrm{MTFD} = \frac{I}{2}\cdot\frac{1}{M}\cdot\left[1 + 2\sum_{j=1}^{M-1} \bar{\alpha}_j\right] \qquad (A9)$$
Appendix B - Case Study
Introduction 

The bus guardian unit (BGU) of the fault-tolerant multi-processor (FTMP)
is a specialized module designed to enable or decouple a bus drive to a system
bus line. Two BGUs are used for each processor/memory unit. A BGU has an
equivalent complexity of about 1200 logic gates. In operation, 3 out of 5 bus
lines are selected on the basis of an internally stored select code SEL. Each
selected bus line is input to a synchronizing sub-unit called a deskewer. The
three synchronized bit streams are voted to yield a serial message. If the
message is recognized as being addressed to the particular BGU, then one of 5
storage registers is updated. Four of these registers generate the 20 BGU
output enable lines while the fifth register contains the select code, SEL.

Because of the fault-tolerant nature of the communication scheme and the 
limited output visibility, the BGU is an interesting module to test. To 
simplify the discussion we will ignore the specialized BGU operations
associated with power-on, master-reset, and power-fail and consider normal
operation. 

At the highest level there are 3 types of BGU behavior:

Case 1. Correct response to a valid message.

Case 2. Failure of the BGU to recognize a valid message. 

Case 3. Change by the BGU when not commanded. 

These last two correspond to a miss or a false alarm respectively. These 
cases require separate tests to detect. The following three facets of the BGU 
add to the testing problem. 


The FTMP communication system is designed to tolerate single failures,
hence the three separate serial inputs which are voted. But with three
correct bus inputs, many BGU internal faults are also tolerated when they
occur prior to voting. BGU voter discrepancies (2 of 3 or 1 of 3) are not
visible as outputs. This situation requires test inputs which are also 2 of 3
or 1 of 3 to propagate faults to a visible output.

Closely related are input selection faults. The 3 of 5 select logic and
SEL code assignment interact. Many single bit changes in a SEL code result in
2 of the three desired bus inputs still selected. Thus many select logic
faults and SEL register faults are not visible with 3 correct inputs. Since
the SEL register contents can only be inferred from other register outputs, a
series of tests is needed.

Finally the BGU address decoder utilizes 20 message positions. A single
stuck position at the correct value can occur in 20 ways, hence 20 tests are
needed to detect these Case 3 failures. In addition there are other faults
associated with the BGU timing logic which result in Case 3 behavior which
cannot be overlapped with addressing false alarm faults.

To illustrate the segmenting idea we make the following testing 
assumptions. The assumptions 2 and 3 correspond, approximately, to the 
manufacturing test environment. 

1. The fault class is single pin stuck-at 1 or 0 faults. 

2. The 5 system bus lines are available as inputs. Bus inputs can be freely 
chosen. 

3. The 20 enable outputs are observable.

4. Fault detection is the objective.

The BGU is constructed from 50 digital integrated circuits (Table B1), 3
delay units, two op amp comparators and some discrete components such as
pull-up resistors, decoupling capacitors, etc. The unit is assembled on a single
circuit board. The digital circuits include 26 SSI packages, 24 MSI packages,
and 3 delay units. The SSI accounts for 327 logic pins with 193 equivalent
gates; the MSI for 328 pins with 1103 equivalent gates. As compared to LSI or
VLSI the gate to pin ratio is quite small.

Examining the design in detail there are five key areas which require
particular test inputs:

A. The five output registers and associated tri-state buffers. 

B. The address decoder. 

C. The select code register and 3 of 5 select logic. 

D. The two voters.

E. The timing decade logic. 

Area C has ten distinct paths through the select logic which require two 
messages for each path, one to change the SEL code and a second message to 
verify the updated SEL code by changing an observable output. Another five 
messages are required to complete the testing of area A. 

Area B requires 20 test messages for Case 3 (false alarm) failures as
noted earlier. The BUSY voter in Area D requires 6 additional false alarm
messages. Finally Area E can be tested for Case 3 faults with 7 more
messages. The total set of 58 messages is sufficient to test for single pin
s-a faults. It can be shown that at least 54 messages are necessary.

The sufficient message set can be divided into 6 segments, with each of
the previously mentioned subsets spread as evenly as possible, yielding 9 or 10
test messages per segment. An upper bound on the $\bar{\alpha}_i$ values for this
segmented test set can be determined by counting the number of faults that are
always detected. Exact values could be determined by simulation. From upper
bounds on the $\bar{\alpha}_i$ a lower bound on the segmenting gain g(6),
expression (10), can be found. The following table lists these values.
     i      $\bar{\alpha}_i$
     1      0.29
     2      0.23
     3      0.17
     4      0.11
     5      0.05

Table B1. Miss Fractions for the Case Study
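The gains of Table 2 can be recovered from these miss fractions with expression (10). The sketch below assumes that the 2 and 3 segment tests are formed by merging adjacent segments of the 6 segment test, so that their miss fractions are read from every third or every second entry of Table B1. That merging rule is an assumption of this sketch rather than a procedure stated in the text, although it does reproduce the tabulated gains.

```python
ALPHA_SIX = [0.29, 0.23, 0.17, 0.11, 0.05]  # Table B1, i = 1..5

def segmenting_gain(alpha_bar):
    """Expression (10) for a list of M-1 miss fractions."""
    m = len(alpha_bar) + 1
    return m / (1 + 2 * sum(alpha_bar))

def merge_segments(alpha_bar, group):
    """Miss fractions after merging `group` adjacent segments into one (assumed rule)."""
    m = len(alpha_bar) + 1
    return [alpha_bar[i * group - 1] for i in range(1, m // group)]

gains = {6 // g: round(segmenting_gain(merge_segments(ALPHA_SIX, g)), 2)
         for g in (3, 2, 1)}
print(gains)  # {2: 1.49, 3: 1.79, 6: 2.22}
```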


